8 research outputs found

    Beyond original Research Articles Categorization via NLP

    Full text link
    This work proposes a novel approach to text categorization -- for unknown categories -- in the context of scientific literature, using Natural Language Processing techniques. The study leverages the power of pre-trained language models, specifically SciBERT, to extract meaningful representations of abstracts from the ArXiv dataset. Text categorization is performed using the K-Means algorithm, and the optimal number of clusters is determined based on the Silhouette score. The results demonstrate that the proposed approach captures subject information more effectively than the traditional arXiv labeling system, leading to improved text categorization. The approach offers potential for better navigation and recommendation systems in the rapidly growing landscape of scientific research literature.
    Comment: Workshop on Human-in-the-Loop Applied Machine Learning (HITLAML), 202
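    As a rough illustration of the pipeline described above, the sketch below embeds abstracts with the pre-trained SciBERT checkpoint and selects the number of K-Means clusters by Silhouette score; the mean pooling, the cluster range, and the helper names are assumptions of this sketch, not details taken from the paper.

```python
# pip install torch transformers scikit-learn
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased").eval()

def embed(abstracts):
    """Mean-pooled SciBERT representations of a list of abstracts."""
    vectors = []
    with torch.no_grad():
        for text in abstracts:
            tokens = tokenizer(text, truncation=True, max_length=512,
                               return_tensors="pt")
            hidden = model(**tokens).last_hidden_state      # (1, T, 768)
            vectors.append(hidden.mean(dim=1).squeeze(0).numpy())
    return vectors

def pick_clusters(embeddings, k_range=range(2, 21)):
    """Return the K-Means clustering whose Silhouette score is highest."""
    best_k, best_score, best_labels = None, -1.0, None
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)
        score = silhouette_score(embeddings, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels

# Usage with real arXiv abstracts: k, labels = pick_clusters(embed(abstracts))
```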

    The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection

    Full text link
    Machine Learning (ML) has emerged as a promising approach in healthcare, outperforming traditional statistical techniques. However, to establish ML as a reliable tool in clinical practice, adherence to best practices regarding data handling, experimental design, and model evaluation is crucial. This work summarizes and strictly observes such practices to ensure reproducible and reliable ML. Specifically, we focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare. We investigate the impact of different data augmentation techniques and model complexity on the overall performance. We consider MRI data from the ADNI dataset to address a classification problem employing a 3D Convolutional Neural Network (CNN). The experiments are designed to compensate for data scarcity and initial random parameters by utilizing cross-validation and multiple training trials. Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures, each varying in the number of convolutional layers. Specifically, the augmentation strategies are based on affine transformations, such as zoom, shift, and rotation, applied concurrently or separately. The combined effect of data augmentation and model complexity leads to a variation in prediction performance of up to 10% in accuracy. When the affine transformations are applied separately, the model is more accurate, independently of the adopted architecture. For all strategies, the model accuracy follows a concave behavior as the number of convolutional layers increases, peaking at an intermediate number of layers. The best model (8 CL, (B)) is the most stable across cross-validation folds and training trials, reaching excellent performance both on the testing set and on an external test set.
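    The distinction between "concurrent" and "separate" augmentation strategies can be made concrete with a small sketch; the transform ranges, interpolation order, and helper names below are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np
from scipy import ndimage

def random_affine(volume, rng, zoom_range=0.1, shift_vox=5, rot_deg=10,
                  use_zoom=True, use_shift=True, use_rotation=True):
    """Apply random zoom/shift/rotation to a 3D volume (D, H, W).

    The three transforms can be enabled together ("concurrent" strategy)
    or one at a time ("separate" strategy)."""
    out = volume
    if use_rotation:
        angle = rng.uniform(-rot_deg, rot_deg)
        out = ndimage.rotate(out, angle, axes=(1, 2), reshape=False,
                             order=1, mode="nearest")
    if use_shift:
        offsets = rng.uniform(-shift_vox, shift_vox, size=3)
        out = ndimage.shift(out, offsets, order=1, mode="nearest")
    if use_zoom:
        factor = 1.0 + rng.uniform(-zoom_range, zoom_range)
        out = _match_shape(ndimage.zoom(out, factor, order=1), volume.shape)
    return out

def _match_shape(vol, shape):
    """Center-crop or zero-pad `vol` back to `shape`."""
    out = np.zeros(shape, dtype=vol.dtype)
    src = [slice(max((v - s) // 2, 0), max((v - s) // 2, 0) + min(v, s))
           for v, s in zip(vol.shape, shape)]
    dst = [slice(max((s - v) // 2, 0), max((s - v) // 2, 0) + min(v, s))
           for v, s in zip(vol.shape, shape)]
    out[tuple(dst)] = vol[tuple(src)]
    return out

# "Separate" strategy: enable one transform per sample instead of all three.
rng = np.random.default_rng(0)
mri = np.zeros((96, 96, 96), dtype=np.float32)   # placeholder volume
augmented = random_affine(mri, rng, use_zoom=False, use_shift=False)  # rotation only
```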

    Improving generalization of vocal tract feature reconstruction: from augmented acoustic inversion to articulatory feature reconstruction without articulatory data

    Full text link
    We address the problem of reconstructing articulatory movements, given audio and/or phonetic labels. The scarce availability of multi-speaker articulatory data makes it difficult to learn a reconstruction that generalizes to new speakers and across datasets. We first consider the XRMB dataset, where audio, articulatory measurements and phonetic transcriptions are available. We show that phonetic labels, used as input to deep recurrent neural networks that reconstruct articulatory features, are in general more helpful than acoustic features in both matched and mismatched training-testing conditions. In a second experiment, we test a novel approach that attempts to build articulatory features from prior articulatory information extracted from phonetic labels. Such an approach recovers vocal tract movements directly from an acoustic-only dataset without using any articulatory measurement. Results show that articulatory features generated by this approach can reach a Pearson product-moment correlation of up to 0.59 with measured articulatory features.
    Comment: IEEE Workshop on Spoken Language Technology (SLT)
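    A minimal sketch of driving articulatory reconstruction from phonetic labels is given below, using a bidirectional recurrent network and a per-feature Pearson correlation for evaluation; the layer sizes, the number of phone classes, and the embedding step are assumptions of this sketch rather than the architecture used in the paper.

```python
import torch
import torch.nn as nn

class PhoneToArticulatory(nn.Module):
    """Bidirectional recurrent net mapping frame-level phone labels to
    articulatory trajectories (illustrative sizes only)."""
    def __init__(self, n_phones=61, n_art_feats=16, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(n_phones, 64)
        self.rnn = nn.LSTM(64, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_art_feats)

    def forward(self, phone_ids):          # (batch, frames) integer labels
        x = self.embed(phone_ids)          # (batch, frames, 64)
        h, _ = self.rnn(x)                 # (batch, frames, 2*hidden)
        return self.head(h)                # (batch, frames, n_art_feats)

def pearson(pred, target):
    """Per-feature Pearson correlation between predicted and measured
    articulatory trajectories of shape (frames, n_art_feats)."""
    p = pred - pred.mean(dim=0, keepdim=True)
    t = target - target.mean(dim=0, keepdim=True)
    return (p * t).sum(0) / (p.norm(dim=0) * t.norm(dim=0) + 1e-8)
```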

    An overview of data integration in neuroscience with focus on Alzheimer's Disease

    Get PDF
    This work represents the first attempt to provide an overview of how to approach data integration as the result of a dialogue between neuroscientists and computer scientists. Indeed, data integration is fundamental for studying complex multifactorial diseases, such as the neurodegenerative diseases. This work aims at warning the readers of common pitfalls and critical issues in both the medical and data science fields. In this context, we define a road map for data scientists when they first approach the issue of data integration in the biomedical domain, highlighting the challenges that inevitably emerge when dealing with heterogeneous, large-scale and noisy data, and proposing possible solutions. Here, we discuss data collection and statistical analysis, usually seen as parallel and independent processes, as cross-disciplinary activities. Finally, we provide an exemplary application of data integration to address Alzheimer's Disease (AD), which is the most common multifactorial form of dementia worldwide. We critically discuss the largest and most widely used datasets in AD, and demonstrate how the emergence of machine learning and deep learning methods has had a significant impact on knowledge of the disease, particularly from the perspective of early AD diagnosis.

    On Deep Learning strategies to address Automatic Speech Recognition (ASR) for dysarthric speech

    Get PDF
    This thesis explores deep learning techniques to improve Automatic Speech Recognition (ASR) for people affected by dysarthria. Dysarthria is a widespread motor disorder causing high speech unintelligibility and, often, motor control abnormalities as well. Hence, ASR-based technologies may represent the only possibility for dysarthric individuals to interact with other people or machines. Unfortunately, traditional ASR systems fail in the presence of dysarthric speech. For instance, we tested the Google Speech API and IBM on a subset of the TORGO dataset: these yield a Word Error Rate (WER) above 80%, while the human error rate is 30%. One of the main issues is that the ASR model cannot capture the inter-speaker variability, because the available dysarthric speech corpora are small and limited. To overcome this issue, we propose three possible strategies. Firstly, we investigate the use of speech production knowledge as additional information in the ASR system. As articulatory features (AFs) are difficult to collect, especially for dysarthric speakers, we move a step backward and first study deep learning based methods to synthesize AFs for audio-only corpora. Specifically, we propose the use of phonetic features in addition or substitution to the acoustic ones in the standard Acoustic Inversion (AI) mapping, with the aim of improving its generalization across datasets. Then, we introduce unsupervised methods to synthesize AFs that leverage phonetic features, to extract raw articulatory information, and acoustic vectors, to capture complex phenomena such as coarticulation. We finally integrate the synthetic AFs as a secondary target or additional input in the ASR model. After a preliminary study on the TIMIT corpus showing encouraging results on phone classification, we evaluate the ASR performance on CHiME-4. The first and the second strategies provide relative WER reductions of 1.9% and 5.4%, respectively, over the traditional ASR system. Secondly, we consider the scenario in which we have access to multiple labelled datasets (sources) and we want to learn a classifier for an unlabelled dataset (target). Such a problem is known in machine learning as multi-source domain adaptation. We propose an algorithm, named Multi-Source Domain Adaptation via Weighted Joint Distribution Optimal Transport (MSDA-WJDOT), that aims at finding simultaneously an Optimal Transport-based alignment between the source and target distributions and a re-weighting of the source distributions, based on their similarity with the target distribution. We then employ MSDA-WJDOT in two real-world applications: dysarthria detection and spoken command recognition. In the first case, we assume that we have multiple labelled noisy datasets containing dysarthric and healthy speech, and we adopt MSDA-WJDOT to learn a binary classifier for an unlabelled noisy dataset. The proposed approach outperforms all the competitor models, improving the detection accuracy by 0.9% over the best one. In the second case, MSDA-WJDOT is used to perform dysarthric speaker adaptation in a voice command recognition system. This provides a relative accuracy improvement of 21% and 12% over the baseline and the best competitor model, respectively. Finally, we focus on contexts in which only a small vocabulary needs to be recognized, which allows us to simplify the problem to spoken command recognition. Towards this direction, we collected a dysarthric speech corpus containing commands related to the task of making a call.
    This is the richest Italian dysarthric speech corpus to date, and it can be used to train a command recognizer and develop a smartphone contacts application. Last but not least, we introduce the AllSpeak project, in which an Android application was developed for people affected by Amyotrophic Lateral Sclerosis. Specifically, this app is based on a voice command recognition system that recognizes commands related to basic needs even when speech intelligibility has almost vanished.
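    For reference, the Word Error Rate quoted above (over 80% for off-the-shelf systems versus roughly 30% for human listeners) can be computed with a standard edit-distance routine like the sketch below; the example sentences are invented for illustration and are not taken from TORGO.

```python
def wer(reference, hypothesis):
    """Word Error Rate: edit distance between word sequences, divided by
    the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Illustrative (hypothetical) heavily misrecognized utterance: WER = 0.75
print(wer("call my brother please", "tall my other sneeze"))
```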

    An interpretation of the classical Retinex model based on the functionality of the LGN

    Get PDF
    In this thesis we consider the problem of image perception, and in particular the contrast sensitivity of our visual system. We study the classical Retinex model, which describes the perceived image as the solution of a Poisson equation. This model is reinterpreted using tools from differential geometry and covariant derivatives. The neurophysiological counterpart of the model is the description of the functionality of the LGN and of the connectivity linking its cells. This connectivity is modelled, with tools from distribution theory, as a kernel given by a fundamental solution of the Laplace equation. The activity of the layer of cells is therefore a solution of the Laplace equation, that is, the same equation that describes Retinex. This proves that these cells are responsible for perception up to illumination.
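    For context, one common PDE formalization of the classical Retinex model mentioned above can be written as follows; the exact thresholding and boundary condition are assumptions of this sketch (in the spirit of Morel, Petro and Sbert's PDE formulation of Retinex), not a reconstruction of the thesis' derivation.

```latex
% Perceived image u as the solution of a Poisson equation driven by the
% thresholded Laplacian of the (log-)luminance I, with Neumann boundary
% conditions on the image domain \Omega.
\Delta u(x) = \delta_t\bigl(\Delta \log I\bigr)(x) \quad \text{in } \Omega,
\qquad
\frac{\partial u}{\partial n} = 0 \quad \text{on } \partial\Omega,
\qquad
\delta_t(s) =
\begin{cases}
  s & \text{if } |s| \ge t,\\
  0 & \text{otherwise.}
\end{cases}
```

    Writing the solution as a convolution with a fundamental solution of the Laplace equation is what connects this formulation to the LGN connectivity kernel described in the abstract above.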

    Multi-source Domain Adaptation via Weighted Joint Distributions Optimal Transport

    No full text
    The problem of domain adaptation on an unlabelled target dataset using knowledge from multiple labelled source datasets is becoming increasingly important. A key challenge is to design an approach that overcomes the covariate and target shift both among the sources and between the source and target domains. In this paper, we address this problem from a new perspective: instead of looking for a latent representation invariant between source and target domains, we exploit the diversity of the source distributions by tuning their weights to the target task at hand. Our method, named Weighted Joint Distribution Optimal Transport (WJDOT), aims at finding simultaneously an Optimal Transport-based alignment between the source and target distributions and a re-weighting of the source distributions. We discuss the theoretical aspects of the method and propose a conceptually simple algorithm. Numerical experiments indicate that the proposed method achieves state-of-the-art performance on simulated and real-life datasets.
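    A minimal sketch of the source re-weighting step, using the POT library, is given below; concatenating features and labels into joint samples, the label scaling beta, the use of the model's predictions as target pseudo-labels, and the grid search in place of the paper's gradient-based optimization are all simplifying assumptions of this sketch.

```python
# pip install pot
import numpy as np
import ot  # Python Optimal Transport

def wjdot_cost(sources, target, alpha, beta=1.0):
    """OT cost between the target joint distribution and the alpha-weighted
    mixture of source joint distributions.

    sources : list of (X_k, Y_k) arrays, Y_k one-hot labels
    target  : (X_t, Y_hat_t), with Y_hat_t the current model's predictions
    alpha   : weights on the simplex, one per source
    beta    : trade-off between feature and label costs
    """
    Xs = np.vstack([np.hstack([X, beta * Y]) for X, Y in sources])
    Xt = np.hstack([target[0], beta * target[1]])
    # Sample weights: each source k contributes total mass alpha_k.
    b = np.concatenate([np.full(len(X), a / len(X))
                        for (X, _), a in zip(sources, alpha)])
    a_t = np.full(len(Xt), 1.0 / len(Xt))
    M = ot.dist(Xt, Xs)        # squared Euclidean cost matrix
    return ot.emd2(a_t, b, M)  # exact OT cost

def best_weights(sources, target, grid=11):
    """Grid search over the simplex (two sources) for the lowest OT cost."""
    best = (None, np.inf)
    for w in np.linspace(0.0, 1.0, grid):
        cost = wjdot_cost(sources, target, np.array([w, 1.0 - w]))
        if cost < best[1]:
            best = (np.array([w, 1.0 - w]), cost)
    return best
```

    In the full method the weights and the predictor are optimized jointly; here the predictor is assumed fixed, so only the simplex weights are searched.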

    Multi-source Domain Adaptation via Weighted Joint Distributions Optimal Transport

    No full text
    The problem of domain adaptation on an unlabelled target dataset using knowledge from multiple labelled source datasets is becoming increasingly important. A key challenge is to design an approach that overcomes the covariate and target shift both among the sources and between the source and target domains. In this paper, we address this problem from a new perspective: instead of looking for a latent representation invariant between source and target domains, we exploit the diversity of the source distributions by tuning their weights to the target task at hand. Our method, named Weighted Joint Distribution Optimal Transport (WJDOT), aims at finding simultaneously an Optimal Transport-based alignment between the source and target distributions and a re-weighting of the source distributions. We discuss the theoretical aspects of the method and propose a conceptually simple algorithm. Numerical experiments indicate that the proposed method achieves state-of-the-art performance on simulated and real-life datasets.